ENphylo: A new method to model the distribution of extremely rare species
Authors
Abstract
Species distribution model (SDM) algorithms are powerful tools for predicting species distributions across the landscape under the hypothesis that environmental conditions influence species' geography (Elith & Leathwick, 2009). In the last decades, SDMs have been widely applied to assess the effect of climate change on species and the impact of invasive species, to select suitable sites for reintroductions, or to address conservation objectives (Barbet-Massin et al., 2018; Fois et al., 2018). Although a number of different algorithms have been successfully used to calibrate SDMs, their applicability is limited by several factors, including species' intrinsic characteristics (e.g. range size, dispersal ability) and methodological issues (Fourcade et al.; Tessarolo et al., 2021). Most, if not all, SDM algorithms fail to apply to species with scanty occurrence records, as with rare species, and at modelling extinct species with sparse fossil records (Raia et al., 2020; Svenning et al., 2011; Varela et al., 2011). Several studies demonstrated that low sample size impacts negatively on SDM accuracy (Jiménez-Valverde; Pearman et al., 2008; Santini et al.). It has been repeatedly suggested that more than 20, ideally 50, geographical occurrences are necessary to provide robust predictions (Santini et al., 2021; Wisz et al., 2008). Thus, although rare species are the most demanding in terms of reliable estimates (since they are exposed to extinction risk, Blomqvist et al., 2010; Eaton et al., 2018), they are also those where SDMs perform worst (Breiner et al., 2015; Lomba et al.; Sousa-Silva et al., 2014). The same applies to extinct species, which is disappointing because the inclusion of fossil information for living species allows approaching the fundamental niche (Maiorano et al., 2013; Raia et al.; Timmermann et al., 2022) and provides genuine evidence of past species response to climate change (Di Febbraro et al., 2017; Mondanaro et al.; Tóth et al., 2019). The main problem is that few occurrences are usually coupled with numerous explanatory variables, causing a strong imbalance between poor information on actual species preferences and rich environmental information. This likely causes model overfitting, which, in turn, reduces model transferability (Vaughan & Ormerod, 2005). A possible solution proposed in the literature is to fit bivariate models (i.e. two variables at a time) and then average them within a weighted ensemble (Lomba et al., 2010). Although viable and effective, this approach is computationally intensive and time-consuming, and it does not resolve the weak starting information about species preferences.
principle, predilections tolerance limits determined traits inherit so climatic can be studied it were phenotype (Pearman Rolland This implies phylogenetic position might supplement scarce data related typically comes species. Starting from assumption, we propose ENphylo, new algorithm able fast accurate combining Ecological Niche Factor Analysis (ENFA, Hirzel 2002) imputation (Garland Ives, 2000). To test ENphylo performance, extremely under-sampled 10 20 occurrences, respectively) compared its predictive ENFA ensembles small (ESM, Breiner 2018) approaches. applying ESM, included three techniques: Maxent, Random Forest (RF) Generalized Linear Models (GLM). We consistently outperforms both ESM occurrences. At performs best, yet fails some 15% still provides decent performance. Crucially, results sampling mere record individual well Maxent using full set implemented previous study. ENphylo's workflow includes consecutive steps. first, embodied R function ENphylo_modeling, formats input occurrence/background points tree), calibrates imputation, evaluates accuracy. second function, ENphylo_prediction, relies output ENphylo_modeling predict marginality, specialization habitat suitability dataset provided user generate spatially explicit predictions). ENphylo_prediction embedded single package, named RRdtn, made available part current steps involved functions described following paragraphs. takes objects inputs: (i) presence/background all analysis list (argument input_data), (ii) tree present input_data (iii) mask defining spatial domain encompassing background area enclosing input_mask). For each input_data, relevant should frame object (as handled ‘sp’ ‘sf’ packages) must include column binary format (1 presences, 0 background), columns coordinates (in case form objects), one variable process. Environmental predictors tree. Optionally, specifying age (which useful when species). Newick Nexus format. RasterLayer object. 
contains internal functions: DATA_PREPARATION, ENFA_CALIBRATION IMPUTED_CALIBRATION. DATA_PREPARATION matches input, rearranging them proper way subsequent package ‘CENFA’ (Rinnan, 2021) compute marginality factors having above minimum threshold >50) min_occ_enfa). Marginality core modelling. thought distance centroid occupied calculated (Rinnan Lawler, Specialization ratio variance Occupied habitats defined given expected show degree multicollinearity. Under ENFA, multicollinearity accounted means factor analysis, performed extract linear combinations maximizing focal eigenvectors. n x m matrix marginality/specialization coefficients rows eigenvectors columns, CO matrix, represents amount (Hirzel 2002). first eigenvector following, orthogonal represent specialization. reduced dropping accounting little variance, according broken-stick criterion (Jackson, 1993). calculates splitting into 80%–20% training/testing samples (the split percentage indicated boot_test_perc argument) calibration–evaluation models. Specifically, (CO80) obtained calibrating 80% (training) entire through row per product × multiplied values g dataset, equal geographic cells data. predicted converted Mahalanobis distances barycenter axes multivariate space (Fonderflick 2015; Hengl 2009; Préau step accommodate residual collinearity among (Calenge convert 0–1 range. latter operation squared approximate Chi-square (Clark remaining 20% (testing) eventually procedure repeated times regulated boot_rep argument), relying default multi-core parallel processing. Predictive assessed discrimination-, reliability-, similarity-based evaluation metrics (Leroy namely operating characteristic curve (AUC; Fielding Bell, 1997), true skill statistic (TSS; Allouche 2006), continuous Boyce index (CBI; 2006) Sørensen similarity (SSI; Leroy Li Guo, 2013). Evaluation PresenceAbsence (Freeman Moisen, 2008) ecospat (Broennimann, packages. 
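The calibration and evaluation scheme described above (repeated 80%–20% training/testing splits, scored with metrics such as the true skill statistic) can be sketched as follows. This is a minimal illustration in Python, not the package's R code: the function names `split_train_test` and `true_skill_statistic`, the 0.5 cut-off, and the toy values are assumptions made for this example, and no niche model is actually fitted.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=None):
    """Randomly split records into training and testing subsets.

    test_fraction plays the role the text assigns to the split
    percentage (0.2 reproduces the 80%-20% scheme).
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def true_skill_statistic(observed, predicted, threshold=0.5):
    """Return TSS = sensitivity + specificity - 1.

    observed: 1 for presence, 0 for background.
    predicted: suitability scores in the 0-1 range.
    threshold: hypothetical cut-off used to binarise the scores.
    """
    tp = fp = tn = fn = 0
    for obs, score in zip(observed, predicted):
        positive = score >= threshold
        if obs == 1 and positive:
            tp += 1
        elif obs == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one background point")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Toy usage: split ten record identifiers, then score made-up predictions.
train, test = split_train_test(range(10), test_fraction=0.2, seed=1)
tss = true_skill_statistic([1, 1, 0, 0], [0.9, 0.8, 0.1, 0.2])
```

In a full workflow the split and scoring would be repeated many times with different random seeds and the scores averaged over replicates, which is the role the text assigns to the bootstrap repetitions.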
Omission error rate further supplied evaluate incidence false positives, recommended AUC/TSS discrimination 2012). taxa INPUT_DATA either lower min_occ_enfa, reports below reference level argument eval_metric_threshold), subset I matrices unavailable reliable, respectively. them, IMPUTED_CALIBRATION, will automatically estimate imputation. Imputation Brownian motion evolution sensible I. phylopars Rphylopars (Goolsby 2017). account uncertainty, alternative trees created altering topology branch lengths original swapONE RRphylo (Castiglione phylogenies create 100 default), proportion tips whose topologic arrangement swapped, nodes changed. keep axes—hence CO—for I, dimensionality imputed forced median retained Phylogenetically swapping iterations bootstrap cross-validation, done ENFA-modelled No computed represented less Once accuracies over replicates, returns outputs strategies specified “output_options” argument: (“output_options” = “full”); achieving user-defined “weighted.mean”), specific metric (“eval_metric_for_imputation”) relative score (“eval_threshold”); corresponding scores iteration “best”). Eventually, forming always min_occ_enfa whereas > validation methods (ENFA versus imputation) performing best retained. “newdata” newdata time frame, both, calibration object, recognizes whether was case, retrieves specification. particular, “full” “best” outputs, best-performing predictions. By “weighted.mean” used, generating performance certain value (AUC 0.7, default) average selected calculated. (provided argument). convert_to_suitability. achieved other state-of-the-art algorithms, built (Phillips RF (Breiman, 2001) GLM (McCullagh Nelder, 1983), posing would outperform competing low-sampling achieve good absolute aim, gathered 21 extant large mammals Eurasia during 200 ka al. (2021). Occurrence 4651 mammal distributed 916 layers. 
along 10,000 (for details, see Each datapoint temporally associated (depending layer) vector retrieved paleoclimate emulator Holden (2019), six non-collinear (Mondanaro 2021): BIO4 (temperature seasonality), BIO8 (mean temperature wettest quarter), BIO (Mean Temperature Warmest Quarter), BIO13 (precipitation month), BIO14 driest month) BIO18 warmest quarter). Along data, constructed tree.merger 2022). combines synthetic time-calibrated Here, source published Carotenuto (2016) Castiglione (2021), welcome since comparing study directly got study, used. As preliminary step, 31 record. Since initially modelled ENFA. evaluated performances these randomly an training testing calculating AUC, TSS, CBI Index. times, scores. From models, ability observed sub-sampled pool, datapoints kept external evaluation). iterated changing iteration. 10-occurrence-wide subsets (see below), underwent round cross-validation scheme evaluation) set. phase 30 phenotypic phylogenies, 50% specifications. generated implemented, “full”, “weighted.mean”, were, respectively, selecting AUC; ‘best.tree’, hereafter) AUC 0.7 (‘selected.tree’, hereafter), selection. optimally tune followed (2018). 15 predictor variables), varied parameters complexity settings chose configuration yielded highest procedure. shown compromise computational there overlap presence (Valavi easily met our historical GLMs, tested shape relationship linear, quadratic cubic) interaction (absent present). ENMeval regularization 0.5 4, feature classes turn: + quadratic, hinge, hinge (Muscarella Among 48 resulting combinations, reporting lowest Akaike corrected (AICc; Warren Seifert, RF, adopted options reported biomod2 scheme, procedure). dropped poorly calibrated < 0.7) analyses. Ensemble GLM, projections respective (Marmion Both procedures carried out SSI values. subsampling 20-occurrence datasets. Significant differences approaches fitting random-slope mixed (LMM), variable, random effect. experiment sensitive imputed. 
3 (9.7%), 28 remaining. after swapped trees. repeating value. Lastly, phylogenies. selecting, turn 6 (19.4%) 9 (29%) significant intensity scenarios, fitted LMM, scenarios Overall, >446.000 replicates levels intensity, ESM). Using 10-occurrence datasets, >50% acceptable 0.75; Elith, 2000), strategy showing well-performing (58%). percentages 29% 3% ESM. proved four metrics, mean averaged 0.75 (0.58–0.88), TSS 0.39 (0.23–0.57), 0.56 (0.31–0.74), 0.68; 0.56–0.78). resulted systematically least (Table 1). LMMs significantly higher any algorithm. finding remains though statistically only (Figure CBI, while significant. When 70% “best.” reached 0.4. emerged 0.79; 0.50–0.92) together 0.60–0.90), 0.77; 0.00–0.99), 0.50; 0.30–0.68) 0.63–0.84; Table 1, S1). values, difference against SSI, pertains outperforming opposite overall, report his algorithm, 2, Figures S1 S2). no term 2) indicating 30% trade-off need accurately inherent difficulty known ‘rare-species paradox’ colleagues overcome based performances. (2015) approach, ‘ensembles models’ (ESM), standard covariate sets pool 107 ranging 140 2015). They found transferability, especially paper, 10–25 However, how much appropriate, terms, pursue goal rarest never tested, effectively addresses fraught unrepresentative tackle issues, method rationale calculate routinely well-sampled rely relatedness derive sampled sophisticated progressively replaced mainstream (but e.g. Andersen Cartledge Mugo Sutton 2021), >200 papers publication 2002 (according Scopus database January 2022), describe preferences, vulnerability global (Cordier Melchionna Rinnan intuitively translated biologically meaningful concepts, width depend biological such thermal limits, body fat metabolism, phenotypes makes effects potential Standing combined produce randomization experiments extreme otherwise dense accuracy, becoming shallower increases. 
Standard ‘learn’ data; hence, variability projected (Liu 2022; Qiao 2019), limit circumvents providing maps differing (Figures 4). 21–23 (>0.7) depending selected. figure 2021, 22, one-fifth one-tenth better fades away rises 2 3), approached slightly (although significant). nearly sixth 2), ‘best’ strategy, stable 2). key point concern stands, appropriate conditions. negligible even misleading inferring (Münkemüller derived are. contrast, positions branching altered (100 maximize uncertainty. relaxes assumption (intrinsic ‘correct’ relieves branches meant extend observational available. demonstrate dealing necessarily nature preservation Alessandro Mondanaro, Mirko Di Pasquale conceived Silvia wrote codes ran All authors contributed equally writing text, preparing figures. grateful Associate Editor, Dr. Tim Lucas, anonymous reviewer precious advice kindly earlier version manuscript. declare conflict interest. peer review history article https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/2041-210X.14066. raw analyses, via permanent GitHub link at: https://github.com/pasraia/RRdtn package) Zenodo https://zenodo.org/badge/latestdoi/588226643 (Raia, 2023; code reproduce experiments). S1. statistics (up) (down) points. Figure Contour plots Index (CBI) metrics) 10-occurrences (10points) 20-occurrences (20points) strategies. vertical AUC) horizontal TSS) solid lines commonly held judge S2. Soerensen (SSI) Please note: publisher responsible content functionality supporting authors. Any queries (other missing content) directed author article.
Journal
Journal title: Methods in Ecology and Evolution
Year: 2023
ISSN: 2041-210X
DOI: https://doi.org/10.1111/2041-210x.14066